1 research outputs found

    Distributed Supervised Statistical Learning

    Get PDF
    We live in the era of big data, nowadays, many companies face data of massive size that, in most cases, cannot be stored and processed on a single computer. Often such data has to be distributed over multiple computers which then makes the storage, pre-processing, and data analysis possible in practice. In the age of big data, distributed learning has gained popularity as a method to manage enormous datasets. In this thesis, we focus on distributed supervised statistical learning where sparse linear regression analysis is performed in a distributed framework. These methods are frequently applied in a variety of disciplines tackling large scale datasets analysis, including engineering, economics, and finance. In distributed learning, one key question is, for example, how to efficiently aggregate multiple estimators that are obtained based on data subsets stored on multiple computers. We investigate recent studies on distributed statistical inferences. There have been many efforts to propose efficient ways of aggregating local estimates, most popular methods are discussed in this thesis. Recently, an important question about the number of machines to deploy is addressed for several estimation methods, notable answers to the question are reviewed in this literature. We have considered a specific class of Liu-type shrinkage estimation methods for distributed statistical inference. We also conduct a Monte Carlo simulation study to assess performance of the Liu-type shrinkage estimation methods in a distributed framework
    corecore